Analysis of Melting Point Models of Benzene Derivatives
Introduction
Melting is the phase transition from a crystalline solid to a liquid [Brown2000]. Melting point is a bulk property of molecular substances that is defined as the temperature at which solid and liquid phases are in equilibrium at a certain pressure [Slovokhotov2007, Brown2000]. This property is difficult to predict accurately because of the significant effect of molecular symmetry and intermolecular forces [Katritzky2001]. QSPR, or quantitative structure-property relationships, is a method that is typically used for melting point analysis [Katritzky2010]. This method develops a linear model using various descriptors [Katritzky2010]. Descriptors are mathematical values used to describe physical and chemical properties [Katritzky2010]. Several methods and software packages can be used to develop QSPR models, and the resulting models can be analyzed statistically [Katritzky2010, Godavarthy2006, Trohalaki2005, Dearden2003, Palmer2007, Breiman2001]. QSPR modeling has been used for developing and analyzing melting point models of benzene derivatives [Abramowitz1990, Martin1979, Katritzky1997, Dearden1988].
Factors Affecting Melting Point
Melting point is determined by the strength of the crystal lattice of the molecule [Katritzky2001]. By definition, melting point can be related to the enthalpy change of fusion and the entropy change of fusion [Brown2000]: MP=ΔH_fus/ΔS_fus (Eq.1)
Enthalpy change of fusion is the amount of energy needed for the phase transition from the solid phase to the liquid phase, and entropy change of fusion is the change in disorder of the molecules within the substance during that transition [Brown2000]. Based on this relationship, the melting point will be increased by any factor that either increases the enthalpy of fusion or decreases the entropy of fusion [Abramowitz1990]. Entropy of fusion can be further defined as [Abramowitz1990]: ΔS_fus=ΔS_pos+ΔS_exp+ΔS_rot (Eq.2)
ΔS_pos is the positional entropy of fusion, which is related to the change in randomness from the solid phase to the liquid phase [Abramowitz1990]. ΔS_exp is the entropy of expansion, which is calculated from the equation [Abramowitz1990]: ΔS_exp=R*ln(V(f)_liq/V(f)_solid) (Eq.3)
In this equation, V(f) is the free volume [Abramowitz1990]. ΔS_rot is the entropy of rotation, which is the change from ordered arrangement of solids to random arrangement of liquids [Abramowitz1990]. Two factors that affect the entropy of fusion and therefore affect the strength of the crystal lattice are molecular symmetry and intermolecular forces [Katritzky2001].
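As a rough illustration of how Eq.1 through Eq.3 fit together, the sketch below computes a melting point from an enthalpy of fusion and the three entropy terms. All numerical inputs are hypothetical and chosen only to show the arithmetic; the function names are not taken from any of the cited works.

```python
import math

R = 8.314  # gas constant in J/(mol*K)


def entropy_of_expansion(vf_liquid, vf_solid):
    """Eq.3: dS_exp = R * ln(V(f)_liq / V(f)_solid)."""
    return R * math.log(vf_liquid / vf_solid)


def entropy_of_fusion(ds_pos, ds_exp, ds_rot):
    """Eq.2: dS_fus = dS_pos + dS_exp + dS_rot."""
    return ds_pos + ds_exp + ds_rot


def melting_point(dh_fus, ds_fus):
    """Eq.1: MP = dH_fus / dS_fus (J/mol divided by J/(mol*K) gives kelvin)."""
    return dh_fus / ds_fus


# Hypothetical inputs, for illustration only.
ds_exp = entropy_of_expansion(vf_liquid=1.2, vf_solid=1.0)
ds_fus = entropy_of_fusion(ds_pos=10.0, ds_exp=ds_exp, ds_rot=25.0)
mp = melting_point(dh_fus=12000.0, ds_fus=ds_fus)

# A lower rotational entropy (for example, a more symmetrical molecule)
# gives a higher melting point for the same enthalpy of fusion.
ds_fus_symmetric = entropy_of_fusion(ds_pos=10.0, ds_exp=ds_exp, ds_rot=15.0)
mp_symmetric = melting_point(dh_fus=12000.0, ds_fus=ds_fus_symmetric)

print(round(mp, 1), round(mp_symmetric, 1))
```

The second calculation shows the direction of the effect described above: reducing ΔS_rot while holding the other terms fixed raises the computed melting point.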
The symmetry of a molecule affects the entropy of rotation, which also affects the entropy of fusion and melting point [Abramowitz1990]. The molecular rotational symmetry number, sigma, is defined as the number of identical positions that can be obtained from rotating the molecule about its center [Dannenfelser1993]. When the value of sigma is higher, there is a higher probability of the molecule being in the correct orientation to fit into the crystal lattice [Dannenfelser1993]. A more symmetrical molecule has a lower rotational entropy of fusion and therefore, it melts at a higher temperature [Abramowitz1990]. Carnelley’s rule states that compounds of high molecular symmetry have high melting points [Brown2000]. This rule is valid for ordered crystals of rigid molecules, disordered crystals of rigid molecules, crystals with intermolecular hydrogen bonds, crystals of nonrigid molecules, and even molten salts [Brown2000]. Symmetry is applicable to melting point, but it does not affect the boiling point of a molecule [Abramowitz1986]. Therefore, there is a relationship between symmetry and melting point.
Intermolecular forces, or forces of attraction between molecules, also affect the melting point of the molecule by affecting the strength of the crystal lattice [Katritzky2001]. Molecules with stronger intermolecular forces have higher melting points because of the higher strength of the crystal lattice [Katritzky2001]. The dominant intermolecular force that affects the strength of the crystal lattice for organic compounds is hydrogen bonding [Katritzky2001]. The contribution of hydrogen bonding to intermolecular forces depends on the strength and number of hydrogen bonds [Abramowitz1986]. Several factors influence hydrogen bonding, including the maximum possible number of hydrogen bonds, the number of possible donor atoms, the number of possible acceptor atoms, the distance between the acceptor and donor, the angle between the acceptor and donor, and the number of possible intramolecular hydrogen bonds that compete with intermolecular hydrogen bonding [Martin1979, Abramowitz1986]. Typically, molecules with high intramolecular forces, which are the forces of attraction within a molecule, have lower intermolecular forces and therefore have lower melting points [Katritzky2001].
In the absence of hydrogen bonds and other strong intermolecular forces, van der Waals intermolecular forces affect the strength of the crystal lattice [Slovokhotov2007]. These forces are weak and therefore, the crystal lattice is weak and a low melting point is expected [Slovokhotov2007]. Van der Waals intermolecular forces include dipolar, induction, and dispersion forces [Abramowitz1986]. Several descriptors contribute to these forces, including ionization potential, polarizability, separation distance, and dipole moment [Abramowitz1986]. For determining the melting point of an organic compound, it is the difference in intermolecular forces between the solid and liquid that matters, not the total intermolecular force [Abramowitz1986]. Ionization potential, polarizability, and separation distance can be used to find the difference in dispersion energies between the solid and liquid [Abramowitz1986]. Polarizability, dipole moment, and separation distance can be used to find the difference between induction forces of the solid and liquid [Abramowitz1986]. The dipole moment and separation distance can be used to find the difference in dipolar forces between the solid and liquid [Abramowitz1986]. Molecular rotational symmetry and intermolecular forces affect the melting point and therefore, these factors should be considered for melting point prediction.
Descriptors
The effect of molecular symmetry and intermolecular forces on melting point makes it difficult to derive methods that can predict the melting point of a wide variety of organic compounds with high accuracy. Several descriptors have been created to quantify these effects in an attempt to predict melting point values. Molecular symmetry has been quantified by descriptors including EXPAN, ORTHO, sigma, and SIGMAL. EXPAN is a descriptor that is defined as the eccentricity cubed, where eccentricity is the maximum molecular length divided by the mean molecular diameter [Abramowitz1990]. If the mean molecular diameter is taken as the diameter of a sphere with the same volume as the molecule, this relationship can be represented by the equation [Abramowitz1990]: EXPAN = (pi*length^3) / (6*volume) (Eq.4)
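The definition of EXPAN as the cube of the length-to-diameter ratio can be sketched as follows. The helper `sphere_equivalent_diameter` is an assumption introduced here (it treats the mean molecular diameter as that of a sphere of equal volume), and the input values are hypothetical, in arbitrary but consistent length units.

```python
import math


def sphere_equivalent_diameter(volume):
    """Diameter of a sphere with the given volume (assumed stand-in for
    the mean molecular diameter)."""
    return (6.0 * volume / math.pi) ** (1.0 / 3.0)


def expan(max_length, mean_diameter):
    """EXPAN = eccentricity cubed, with eccentricity = length / diameter."""
    eccentricity = max_length / mean_diameter
    return eccentricity ** 3


# Hypothetical molecule with the volume of a sphere of radius 1.
volume = 4.0 / 3.0 * math.pi
diameter = sphere_equivalent_diameter(volume)

compact = expan(max_length=2.0, mean_diameter=diameter)    # length equals diameter
elongated = expan(max_length=4.0, mean_diameter=diameter)  # twice as long

print(round(compact, 3), round(elongated, 3))
```

A molecule whose maximum length equals its mean diameter has an EXPAN of 1, and doubling the length at constant volume raises EXPAN by a factor of eight.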
ORTHO is a descriptor that is defined as the number of functional groups that are ortho, or adjacent, to another group [Abramowitz1990]. For this descriptor, fluorine is equivalent to hydrogen and therefore, fluorine does not contribute to ortho interactions [Abramowitz1990]. Sigma is the rotational symmetry number that is calculated as the number of orientations of the molecule that are identical to the original orientation from a certain position [Abramowitz1990].
There are several discrepancies in the definition of sigma. Abramowitz and Yalkowsky define sigma as the molecular rotational symmetry number taking the size of substituted groups into account [Abramowitz1990]. Based on this definition, fluoro is equivalent to hydrogen and bromo, chloro, cyano, and nitro are equivalent to methyl [Abramowitz1990]. For example, ortho-, meta-, and para-bromochlorobenzene have sigma values of 2, 2, and 4, respectively [Abramowitz1990]. This version of the descriptors will be referred to as SIGMALabramowitz and sigmaAbramowitz. Abramowitz justifies his system of equal substituents on the grounds that a molecule without a symmetry center can form centrosymmetric crystals based on the positional entropy of the crystals [Abramowitz1986]. These disordered rigid structures can be confirmed through x-ray diffraction analysis [Abramowitz1986]. Abramowitz also uses a variation of sigma called SIGMAL, which is a descriptor that is defined as the logarithm of sigma [Abramowitz1990]. Many other researchers use a definition of sigma that is different from the definition Abramowitz describes. Martin, Yalkowsky, and Wells define sigma as the symmetry number not taking the size of substituted groups into account and therefore, each substituent group is considered different [Martin1979]. For example, ortho-, meta-, and para-bromochlorobenzene have sigma values of 1, 1, and 2, respectively [Martin1979]. This version of the descriptors will be referred to as SIGMALmartin and sigmaMartin. Dannenfelser, Surendran, and Yalkowsky agree with the definition of sigma that is described by Martin, Yalkowsky, and Wells [Mishra1991].
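The two conventions can be compared side by side using the bromochlorobenzene values quoted above. This is only a sketch: the text does not state the base of the logarithm used for SIGMAL, so base 10 is assumed here, and the dictionary and function names are illustrative.

```python
import math

# Sigma values for the bromochlorobenzene isomers as quoted in the text.
SIGMA_ABRAMOWITZ = {"ortho": 2, "meta": 2, "para": 4}
SIGMA_MARTIN = {"ortho": 1, "meta": 1, "para": 2}


def sigmal(sigma):
    """SIGMAL = logarithm of sigma (base 10 is an assumption)."""
    return math.log10(sigma)


for isomer in ("ortho", "meta", "para"):
    print(
        isomer,
        round(sigmal(SIGMA_ABRAMOWITZ[isomer]), 3),
        round(sigmal(SIGMA_MARTIN[isomer]), 3),
    )
```

Under either convention the para isomer has the largest symmetry descriptor, but the ortho and meta isomers have a SIGMAL of zero only under the Martin convention.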
Other descriptors have been used to predict melting points because of their contribution to intermolecular forces [Abramowitz1986]. Descriptors including HNO and IHB are significant because of their influence on hydrogen bonding [Martin1979]. HNO is a descriptor that is defined as the maximum possible number of hydrogen bonds and is also known to affect the entropy of fusion [Martin1979]. This descriptor can be predicted from molecular structure by calculating two times the number of hydrogen bond acceptor sites and two times the number of hydrogen bond donor sites and using the smaller of the two values [Martin1979]. IHB is a descriptor that is defined as the number of possible internal, or intramolecular, hydrogen bonds that compete with external, or intermolecular, hydrogen bonding [Martin1979].
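The HNO rule described above reduces to a one-line calculation. The sketch below uses hypothetical site counts rather than values for any particular compound, and the function name is illustrative.

```python
def hno(n_acceptor_sites, n_donor_sites):
    """Maximum possible number of hydrogen bonds: the smaller of twice the
    acceptor-site count and twice the donor-site count."""
    return min(2 * n_acceptor_sites, 2 * n_donor_sites)


# Hypothetical site counts, for illustration only.
print(hno(n_acceptor_sites=3, n_donor_sites=1))  # limited by the donors
print(hno(n_acceptor_sites=2, n_donor_sites=0))  # no donors, no hydrogen bonds
```

The limiting count is whichever of the donor or acceptor sites is scarcer, so a molecule with acceptors but no donors has an HNO of zero.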
Several other descriptors contribute to intermolecular forces because of their influence on van der Waals interactions [Abramowitz1986]. These descriptors include dipole moment, quadrupole moment, polarizability, ionization potential, and intermolecular separation distance [Abramowitz1986, Martin1979]. Dipole moment is a descriptor that is known to affect the entropy of fusion and can be predicted from molecular structure [Martin1979]. This descriptor is typically abbreviated either DM or u [Martin1979, Abramowitz1986]. The net dipole moment is a measure of a molecule’s ability to electrostatically interact with neighboring dipoles [Abramowitz1986]. It also is a measure of the ability of a molecule to induce a dipole in a neighboring molecule [Abramowitz1986]. Dipole moment contributes significantly to dipolar forces and induction forces [Abramowitz1986]. It is important to understand that at close distances of separation, a scalar sum of the individual group dipole moments is more reasonable for correlating melting point models [Abramowitz1986]. A variation on dipole moment is quadrupole moment, which is represented by QM [Abramowitz1986]. Quadrupole moment is a descriptor that accounts for electronic effects and is defined as the sum of the charges multiplied by the squared distance between the charges [Martin1979]. Based on this definition, QM can be calculated by the sum of the squares of the dipole moments of each substituent group [Martin1979]. Polarizability is a descriptor that is often represented by alpha or a, and is defined as the ability of the molecule to interact with dipoles [Martin1979]. The polarizability of a molecule is a measure of the size of the molecule’s electron cloud [Abramowitz1986]. Often, it is proportional to the volume of the molecule [Abramowitz1986]. Alpha is the sum of the polarizabilities of each group in the molecule [Martin1979].
Group values for C6H4, methyl, chloro, bromo, nitro, amine, hydroxy, and carboxyl are 10, 2.2, 2.3, 3.3, 2.6, 1.7, 1.0, and 2.8, respectively [Martin1979]. Polarizability significantly contributes to induction and dispersion forces [Abramowitz1986]. Ionization potential, which is represented by I, is a descriptor defined as the energy required to remove an electron from a molecule [Abramowitz1986]. This descriptor minimally affects dispersion forces and it does not affect dipolar or induction forces [Abramowitz1986]. Intermolecular separation distance, which is abbreviated as r, is a non-molecular parameter that depends on packing efficiency [Abramowitz1986]. For benzenes, the intermolecular separation distance is the distance between interacting substituent groups [Abramowitz1986]. This parameter has major effects on van der Waals forces [Abramowitz1986]. Generally, intermolecular separation distances are smaller in a solid than a liquid [Abramowitz1986].
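The additive descriptors alpha and QM can be sketched from the group values quoted above. The group polarizabilities are the ones listed in the text; the group dipole moments passed to `qm` are hypothetical, and the function names are illustrative rather than taken from the cited works.

```python
# Group polarizability values as quoted in the text [Martin1979].
GROUP_POLARIZABILITY = {
    "C6H4": 10.0,
    "methyl": 2.2,
    "chloro": 2.3,
    "bromo": 3.3,
    "nitro": 2.6,
    "amine": 1.7,
    "hydroxy": 1.0,
    "carboxyl": 2.8,
}


def alpha(substituents):
    """Sum of group polarizabilities for a disubstituted benzene:
    the C6H4 ring plus each substituent."""
    ring = GROUP_POLARIZABILITY["C6H4"]
    return ring + sum(GROUP_POLARIZABILITY[name] for name in substituents)


def qm(group_dipole_moments):
    """Sum of the squares of the substituent group dipole moments."""
    return sum(mu ** 2 for mu in group_dipole_moments)


alpha_bromochloro = alpha(["bromo", "chloro"])
qm_example = qm([1.5, 1.6])  # hypothetical group dipole moments

print(round(alpha_bromochloro, 1), round(qm_example, 2))
```

Because both descriptors are simple sums over groups, they take the same value for the ortho, meta, and para isomers of a given substituent pair.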
Other physical properties can be used as descriptors for predicting melting point. Boiling point is a bulk property of molecular substances that is defined as the temperature at which the molecule can escape from the liquid and enter the vapor phase [Dearden2003]. Boiling point is a physical property that has been analyzed to determine its relationship with melting point [Abramowitz1990]. This property can be used in the prediction of melting point values [Abramowitz1990]. Published methods show that boiling point can be estimated using additive constitutive properties, which are properties that can be added based on individual groups or atoms and that are dependent on structure [Simamora1993, Abramowitz1990]. Melting point prediction requires both additive constitutive properties and non-additive non-constitutive properties, such as rotational symmetry [Simamora1993]. Therefore, boiling point can be used for the prediction of melting point as an additive constitutive property [Simamora1993, Abramowitz1990].
Development and Analysis of QSPR Models
QSPR, or quantitative structure-property relationships, are structure-based models that have been developed to overcome the problems and difficulties with group contribution methods [Godavarthy2006]. QSPR models are built from linear models and descriptors that relate physical and chemical properties to molecular structures [Godavarthy2006, Katritzky2010]. There are two major aspects of QSPR modeling: representation of chemical parameters and statistical analysis [Katritzky2010]. To develop a QSPR model, a training set must be established, the molecular structure must be defined, descriptors must be generated and analyzed for importance, and then a linear model must be created and analyzed [Katritzky2010]. A training set is a random selection of a certain percent of the original data set that is used to generate descriptors [Breiman2001, Katritzky2010].
There are several methods and software packages for developing and analyzing descriptors. One software program that is used extensively is CODESSA [Katritzky2002A, Godavarthy2006, Katritzky1997, Katritzky2002B]. CODESSA is a program that can provide correlations and understanding about important descriptors that affect physical properties, such as melting point [Katritzky2002]. The program uses the best multilinear regression method, or BMLR, to find the best multilinear correlations [Katritzky2002A]. It also eliminates descriptors that are not available for some molecules [Katritzky2002A]. Using this method, the descriptors are ordered by decreasing correlation coefficient and then, the top descriptors are correlated with the remaining descriptors to determine the best pairs of descriptors [Katritzky2002A]. Other software programs that are used for QSPR modeling are AMPAC, DRAGON, POLLY, ADAPT, OASIS, Chem-X, Tsar, and QSAR-Model [Godavarthy2006, Katritzky2010, Trohalaki2005, Dearden2003].
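The first two steps described for BMLR above (dropping descriptors that are unavailable for some molecules, then ordering the rest by correlation with the property) can be sketched in plain Python. This is not CODESSA's implementation: the descriptor names and values are hypothetical, and ranking by the absolute value of the Pearson correlation coefficient is an assumption made for this sketch.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def rank_descriptors(descriptors, target):
    """Drop descriptors with missing values (None), then sort the rest
    by decreasing absolute correlation with the target property."""
    complete = {
        name: values
        for name, values in descriptors.items()
        if None not in values
    }
    scored = [(name, pearson_r(values, target)) for name, values in complete.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical descriptor table for five molecules.
descriptors = {
    "d1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "d2": [2.0, 1.0, 4.0, 3.0, 5.0],
    "d3": [1.0, None, 2.0, 3.0, 1.0],  # missing for one molecule, so dropped
}
melting_points = [10.0, 20.0, 30.0, 40.0, 50.0]

ranking = rank_descriptors(descriptors, melting_points)
print(ranking)
```

The later BMLR step, in which the top descriptors are paired with the remaining ones, would build on a ranking like this one and is not shown.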
Another useful method for developing and analyzing descriptors is a Random Forest, which is a method of classification and regression [Palmer2007]. This method is very useful for QSPR modeling because it has high accuracy for prediction, it has a method of descriptor selection, and it has a method for determining the significance of each descriptor for the data set [Palmer2007]. The Random Forest method is based on growing an ensemble of decision trees and having them evaluate the significance of descriptors [Breiman2001]. The ensemble of trees is created by random vectors that control the growth of each tree [Breiman2001].
Figure 1: An example of a Random Forest plot [Breiman2001]
The graph shows how the Random Forest sorts variables, or descriptors, in order of importance [Breiman2001]. The most important descriptors for the data set are in the upper right corner of the graph [Breiman2001]. This method is useful for determining the most important descriptors and then, the descriptors can be applied to QSPR modeling [Palmer2007].
After the descriptors are created and selected, a QSPR model is developed and then statistically analyzed [Katritzky2010]. Statistical analysis is needed to judge how well a QSPR melting point model performs. To determine the accuracy of the model, a plot of predicted melting point values versus experimental melting points can be used [Marcus1998]. The best-fit line on this plot has a correlation coefficient, or r-value, which is typically reported in squared form as R^2 [Katritzky2010]. This regression line determines the coefficients of the descriptors used for the model [Katritzky2010].
Figure 2: Predicted Melting Point vs. Experimental Melting Point Plot [Katritzky2010]
The accuracy of a QSPR model is quantified by root mean square error (RMSE/RMSEP), residual standard deviation (RSD), or predictive squared correlation coefficient (Q^2) [Katritzky2010]. Average absolute error, or AAE, is also a statistic that can be used [Katritzky2010]. The predictive ability of a QSPR model is typically quantified by RMSEP [Faber1999]. There are two methods for estimating RMSEP: the approximate sampling distribution of mean square and error propagation [Faber1999]. These statistical analyses are an essential aspect of QSPR modeling because they are part of the development and validation of the model [Katritzky2010].
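The error statistics named above can be sketched in a few lines. The melting point values are hypothetical, and the Q^2 formula used here (one minus the ratio of the squared prediction errors to the squared deviations of the experimental values from their own mean) is one common convention, assumed for illustration; the cited works may define the reference mean differently.

```python
import math


def rmse(predicted, experimental):
    """Root mean square error between predicted and experimental values."""
    n = len(predicted)
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(predicted, experimental)) / n)


def aae(predicted, experimental):
    """Average absolute error."""
    n = len(predicted)
    return sum(abs(p - e) for p, e in zip(predicted, experimental)) / n


def q_squared(predicted, experimental):
    """Predictive squared correlation coefficient (assumed convention:
    1 - sum of squared errors / sum of squares about the experimental mean)."""
    mean_e = sum(experimental) / len(experimental)
    ss_err = sum((e - p) ** 2 for p, e in zip(predicted, experimental))
    ss_tot = sum((e - mean_e) ** 2 for e in experimental)
    return 1.0 - ss_err / ss_tot


# Hypothetical melting points in kelvin.
experimental = [300.0, 320.0, 350.0, 400.0]
predicted = [310.0, 315.0, 340.0, 405.0]

print(round(rmse(predicted, experimental), 2))
print(aae(predicted, experimental))
print(round(q_squared(predicted, experimental), 3))
```

When the predicted values come from compounds held out of the training set, the first statistic corresponds to RMSEP rather than RMSE.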
QSPR Models
QSPR models are useful for melting point prediction and analysis and have been used by various researchers to develop melting point models for benzene derivatives [Abramowitz1990, Martin1979, Katritzky1997, Dearden1988]. Abramowitz and Yalkowsky used QSPR as a method for testing the influence of additive constitutive properties on predicting melting point by comparing the relationship between melting point (in Kelvin) and boiling point (in Kelvin) using the equation [Abramowitz1990]: MP = 0.848*BP – 105.7 (Eq.5)
When this equation was applied to 85 benzene derivatives, there was a correlation, or r-value, of 0.763 [Abramowitz1990]. This low correlation indicates that melting point is influenced by more than just additive constitutive properties [Abramowitz1990].
After showing that melting point is influenced by non-additive non-constitutive properties, Abramowitz and Yalkowsky applied different descriptors to their data set using QSPR modeling [Simamora1993, Abramowitz1990]. Rotational symmetry is a non-additive non-constitutive property and therefore, Abramowitz and Yalkowsky used SIGMALabramowitz along with the additive constitutive property, boiling point, to predict melting point with the equation [Simamora1993, Abramowitz1990]: MP = 0.952*BP + 113.0*SIGMALabramowitz – 206.2 (Eq.6)
When this equation was applied to their data set, there was a correlation of 0.908 [Abramowitz1990]. This r-value supports the importance of SIGMALabramowitz for estimating melting point.
Abramowitz and Yalkowsky also tested a four-parameter equation to estimate melting point values of their data set [Abramowitz1990]. These parameters include boiling point, SIGMALabramowitz, EXPAN, and ORTHO [Abramowitz1990]. The equation they developed is [Abramowitz1990]: MP = 0.772*BP + 110.8*SIGMALabramowitz +11.56*ORTHO + 31.9*EXPAN – 234.4 (Eq.7)
This equation had a correlation of 0.938 on their data set [Abramowitz1990]. This correlation is slightly better than the previous correlation with the two-parameter equation [Abramowitz1990].
Abramowitz and Yalkowsky tested an equation that only included non-additive non-constitutive properties, SIGMALabramowitz, ORTHO, and EXPAN, and not the additive constitutive property, boiling point, to predict melting point [Abramowitz1990]. By testing an equation with only non-additive non-constitutive properties, Abramowitz and Yalkowsky were testing the importance of additive constitutive properties for estimating melting point [Abramowitz1990]. The equation with these descriptors is [Abramowitz1990]: MP = 86.5*SIGMALabramowitz + 29.8*ORTHO + 63.0*EXPAN + 20.5 (Eq.8)
The equation had a correlation of 0.767 on their data set [Abramowitz1990]. The low correlation indicates that both non-additive non-constitutive properties and additive constitutive properties are significant for melting point estimation of benzene derivatives [Abramowitz1990].
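Eq.5 through Eq.8 can be written out as functions to make the role of each term concrete. The coefficients are the ones quoted above; the boiling point and descriptor values are hypothetical, and SIGMAL is computed with a base-10 logarithm, which is an assumption since the base is not stated in the text.

```python
import math


def mp_eq5(bp):
    """Eq.5: boiling point only (temperatures in kelvin)."""
    return 0.848 * bp - 105.7


def mp_eq6(bp, sigmal):
    """Eq.6: boiling point plus SIGMALabramowitz."""
    return 0.952 * bp + 113.0 * sigmal - 206.2


def mp_eq7(bp, sigmal, ortho, expan):
    """Eq.7: four-parameter model."""
    return 0.772 * bp + 110.8 * sigmal + 11.56 * ortho + 31.9 * expan - 234.4


def mp_eq8(sigmal, ortho, expan):
    """Eq.8: non-additive non-constitutive descriptors only."""
    return 86.5 * sigmal + 29.8 * ortho + 63.0 * expan + 20.5


# Hypothetical boiling point shared by two compounds that differ only in symmetry.
bp = 450.0
mp_sigma1 = mp_eq6(bp, math.log10(1))  # sigma = 1
mp_sigma4 = mp_eq6(bp, math.log10(4))  # sigma = 4

print(round(mp_eq5(bp), 1))
print(round(mp_sigma4 - mp_sigma1, 1))
```

Under these assumptions, Eq.6 predicts a melting point roughly 68 K higher for the compound with sigma of 4 than for the compound with sigma of 1 at the same boiling point.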
The melting point models of Abramowitz and Yalkowsky are examples of QSPR. Several other methods using QSPR have been developed and tested for melting point prediction. One such melting point model is a model developed by Martin, Yalkowsky, and Wells [Martin1979]. These authors developed two models that could predict melting point in Celsius using the descriptors sigma(martin), alpha, QM, HNO, IHB, and DM [Martin1979]. One equation they developed is [Martin1979]: MP = 29.7*sigma(martin) + 28.7*alpha + 5.4*QM + 41.8*HNO – 28.5*IHB – 10.2*DM – 492 (Eq.9)
This equation has a correlation, or r-value, of 0.892 for their data set consisting of 84 disubstituted benzene derivatives [Martin1979]. Another equation they developed only uses the descriptors, sigma(martin), alpha, QM, and HNO [Martin1979]: MP = 36.9*sigma(martin) + 30.2*alpha + 4.07*QM + 40.4*HNO – 535 (Eq.10)
This equation has a correlation of 0.882 for the same data set [Martin1979]. Therefore, the six-parameter equation has only a slightly better correlation than the four-parameter equation [Martin1979].
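The six-parameter and four-parameter equations can be compared on the same inputs with a short sketch. The coefficients are those of Eq.9 and Eq.10; every descriptor value passed in below is hypothetical and is not meant to represent a specific compound.

```python
def mp_eq9(sigma, alpha, qm, hno, ihb, dm):
    """Eq.9: six-parameter model, melting point in degrees Celsius."""
    return (29.7 * sigma + 28.7 * alpha + 5.4 * qm
            + 41.8 * hno - 28.5 * ihb - 10.2 * dm - 492)


def mp_eq10(sigma, alpha, qm, hno):
    """Eq.10: four-parameter model, melting point in degrees Celsius."""
    return 36.9 * sigma + 30.2 * alpha + 4.07 * qm + 40.4 * hno - 535


# Hypothetical descriptor values, for illustration only.
inputs = {"sigma": 2, "alpha": 15.6, "qm": 4.81, "hno": 0}

six_parameter = mp_eq9(ihb=0, dm=0.1, **inputs)
four_parameter = mp_eq10(**inputs)

print(round(six_parameter, 2), round(four_parameter, 2))
```

Writing the two models this way makes it easy to see which terms the four-parameter form drops (IHB and DM) and how the remaining coefficients shift to compensate.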
Katritzky, Maran, Karelson, and Lobanov developed several melting point models for benzene derivatives using QSPR [Katritzky1997]. The first model is a nine-parameter equation for a data set of 443 substituted benzenes, including 131 ortho-substituted benzenes, 124 meta-substituted benzenes, and 155 para-substituted benzenes [Katritzky1997]. This model uses the descriptors HDSA2, av valency of H atom, TMSA, first order av structural information content, second order av information content, maximum total interaction for a C-H bond, av nucleophile reaction index for a C atom, Beta polarizability, and sigma(martin) [Katritzky1997]. This model had a correlation, or r-value, of 0.915 [Katritzky1997]. Three additional six-parameter models were created for the ortho-substituted, meta-substituted, and para-substituted subsets, respectively [Katritzky1997]. The model for the ortho-substituted benzene data set uses the descriptors HDSA2, ALFA polarizability, av valency of H atom, maximum total interaction for a C-H bond, ZX shadow, and maximum partial charge for an O atom, and it has an r-value of 0.922 for the data set [Katritzky1997]. The model for the meta-substituted benzene data set uses the descriptors av valency of H atom, av nucleophile reaction index for C atom, HDSA2, sigma(martin), LUMO energy, and DPSA2 difference in CPSAs [Katritzky1997]. The correlation for this model was 0.923 for the data set [Katritzky1997]. The model for the para-substituted benzene data set uses the descriptors XY shadow, new H-acceptors PSA, minimum e-e repulsion for H atom, HDSA2, second order av information content, and av valency H atom, and its correlation was 0.929 for the data set [Katritzky1997]. The researchers used software to determine which descriptors have the highest significance and are most useful for obtaining a high correlation value for the data sets [Katritzky1997]. It would be difficult to replicate these models because software is required to obtain values for many of these descriptors [Katritzky1997].
Another QSPR model is a model by Dearden and Rahman [Dearden1988]. This model is a seven-parameter equation that was created based on a data set of 43 anilines [Dearden1988]. The equation uses the descriptors hydrogen bond donor ability (HD), Swain-Lupton field parameter (F), third order valence-corrected molecular connectivity, hydrogen bond acceptor ability (HA), p-substitution (IP), Swain-Lupton resonance parameter (R), and Sterimol length parameter (L) [Dearden1988]. This model resulted in a 0.94 correlation for the data set and when two outliers were removed, the new model had a correlation of 0.961 [Dearden1988].
Based on the various QSPR models, there are several obvious problems with this method of modeling. Each model has a high correlation for the data set it was created from, but none of these models were replicated or extended to bigger data sets. Also, many of the models are very specific for a certain group of organic molecules. There are models for only benzene derivatives, only ortho-substituted benzene derivatives, only meta-substituted benzene derivatives, only para-substituted benzene derivatives, and only anilines [Abramowitz1990, Dearden1988, Martin1979, Katritzky1997]. Even when looking beyond benzene derivatives, the models are very specific for data sets. There is a model that applies only to hydrocarbons [Tsakalotos1906] and a model that applies only to alkanes [Burch2004]. These problems have affected the overall progress in research of melting point models. Another problem with QSPR modeling is that it requires additional software or methods for molecular screening [Godavarthy2006]. Although this method is useful, it has some flaws.
Conclusion
QSPR modeling is a useful method for creating and analyzing melting point models of benzene derivatives. In the last twenty to thirty years, there have been advancements in melting point modeling, but researchers have all used different descriptors and methods to develop results. Comparison analyses of different melting point models of benzene derivatives could be very useful for further advancements in melting point modeling. Based on the significance of molecular symmetry and intermolecular forces on melting point, further research and analysis should also be done on these relationships.
References
Abramowitz, R. Estimation of the Melting Point of Rigid Organic Compounds. ProQuest Dissertations and Theses. 225, (1986). [DOI]
Abramowitz, R., Yalkowsky, S.H. Melting Point, Boiling Point, and Symmetry. Pharmaceutical Research. 7(9), 942-947 (1990). [DOI]
Breiman, L. Random Forests. Machine Learning. 45(1), 5-32 (2001). [DOI]
Brown, R.J.C., Brown, R.F.C. Melting Point and Molecular Symmetry. Journal of Chemical Education. 77(6), 724 (2000). [DOI]
Burch, K.J., Whitehead, E.G. Melting-Point Models of Alkanes. Journal of Chemical Engineering Data. 49(4), 858-863 (2004). [DOI]
Dannenfelser, R.M., Surendran, N., Yalkowsky, S.H. Molecular Symmetry and Related Properties. SAR and QSAR in Environmental Research. 1(4), 273-292 (1993). [DOI]
Dearden, J.C. Quantitative Structure-Property Relationships for Prediction of Boiling Point, Vapor Pressure, and Melting Point. Environmental Toxicology and Chemistry. 22(8), 1696-1709 (2003). [DOI]
Dearden, J.C., Rahman, M.H. QSAR Approach to the Prediction of Melting Points of Substituted Anilines. Mathematical and Computer Modelling. 11(1), 843-846 (1988). [DOI]
Faber, N.M. Estimating the uncertainty in estimates of root mean square error of prediction: application to determining the size of an adequate test set in multivariate calibration. Chemometrics and Intelligent Laboratory Systems. 49(1), 79-89 (1999). [DOI]
Godavarthy, S.S., Robinson, R.L., Gasem, K.A.M. An Improved Structure-Property Model for Predicting Melting-Point Temperatures. Industrial & Engineering Chemistry Research. 45(14), 5117-5126 (2006). [DOI]
Katritzky, A.R., Jain, R., Lomaka, A., Petrukhin, R., Karelson, M., Visser, A.E., Rogers, R.D. Correlation of the Melting Points of Potential Ionic Liquids (Imidazolium Bromides and Benzimidazolium Bromides) Using the CODESSA Program. Journal of Chemical Information and Computer Sciences. 42(2), 225-231 (2002). [DOI]
Katritzky, A.R., Jain, R., Lomaka, A., Petrukhin, R., Maran, U., Karelson, M. Perspective on the Relationship between Melting Points and Chemical Structure. Crystal Growth & Design. 1(4), 261-265 (2001). [DOI]
Katritzky, A.R., Kuanar, M., Slavov, S., Hall, C.D., Karelson, M., Kahn, I., Dobchev, D. Quantitative Correlation of Physical and Chemical Properties with Chemical Structure: Utility for Prediction. Chemical Reviews. 110(10), 5714-5789 (2010). [DOI]
Katritzky, A.R., Lomaka, A., Petrukhin, R., Jain, R., Karelson, M., Visser, A.E., Rogers, R.D. QSPR Correlation of the Melting Point for Pyridinium Bromides, Potential Ionic Liquids. Journal of Chemical Information and Computer Sciences. 42(1), 71-74 (2002). [DOI]
Katritzky, A.R., Maran, U., Karelson, M., Lobanov, V.S. Prediction of Melting Points for the Substituted Benzenes: A QSPR Approach. Journal of Chemical Information and Computer Sciences. 37(5), 913-919 (1997). [DOI]
Marcus, A.H., Elias, R.W. Some Useful Statistical Methods for Model Validation. Environmental Health Perspectives. 106(6), 1541-1550 (1998). [DOI]
Martin, E., Yalkowsky, S.H., Wells, J.E. Fusion of Disubstituted Benzenes. Journal of Pharmaceutical Sciences. 68(5), 565-568 (1979). [DOI]
Mishra, D.S., Yalkowsky, S.H. Estimation of Vapor Pressure of Some Organic Compounds. Industrial & Engineering Chemistry Research. 30(7), 1609-1612 (1991). [DOI]
Palmer, D.S., O’Boyle, N.M., Glen, R.C., Mitchell, J.B.O. Random Forest Models to Predict Aqueous Solubility. Journal of Chemical Information and Modeling. 47(1), 150-158 (2007). [DOI]
Simamora, P., Miller, A.H., Yalkowsky, S.H. Melting Point and Normal Boiling Point Correlations: Applications to Rigid Aromatic Compounds. Journal of Chemical Information and Modeling. 33(3), 437-440 (1993). [DOI]
Slovokhotov, Y.L., Batsanov, A.S., Howard, J.A.K. Molecular van der Waals symmetry affecting bulk properties of condensed phases: melting and boiling points. Structural Chemistry. 18(4), 477-491 (2007). [DOI]
Trohalaki, S., Pachter, R., Drake, G.W., Hawkins, T. Quantitative Structure-Property Relationships for Melting Points and Densities of Ionic Liquids. Energy & Fuels. 19(1), 279-284 (2005). [DOI]
Tsakalotos, D.E. Sur le point de fusion des hydrocarbures homologues du methane. Comptes rendus de l’Academie des Sciences. Paris. 143, 1235-1236 (1906). [DOI]
Analysis of Melting Point Models of Benzene Derivatives
Introduction
Melting is the phase transition from a crystalline solid to a liquid [Brown2000]. Melting point is a bulk property of molecular substances that is defined as the temperature at which solid and liquid phases are in equilibrium at a certain pressure [Slovokhovtov2007, Brown2000]. This property is difficult to predict accurately because of the significant affect of molecular symmetry and intermolecular forces [Katritzky2001]. QSPR, or quantitative structure-property relationships, is a method that is typically used for melting point analysis [Katritzky2010]. This method develops a linear model using various descriptors [Katritzky2010]. Descriptors are mathematical values used to describe physical and chemical properties [Katritzky2010]. There are several methods and software that can be used to develop QSPR models and there are ways of statistically analyzing these models [Katritzky2010, Godavarthy2006, Katritzky2010, Trohalaki2005, Dearden2003, Palmer2007, Breiman2001]. QSPR modeling has been used for developing and analyzing melting point models of benzene derivatives [Abramowitz1990, Martin1979, Katritzky1997, Dearden1988].Factors Affecting Melting Point
Melting point is determined by the strength of the crystal lattice of the molecule [Katritzky2001]. By definition, melting point can be related to the enthalpy change of fusion and the entropy change of fusion [Brown2000]:
MP = ΔH_fus/ΔS_fus (Eq.1)
Enthalpy change of fusion is the amount of energy needed for the phase transition from the solid phase to the liquid phase, and entropy of fusion is the change in the disorder of the molecules during that transition [Brown2000]. Based on this relationship, the melting point will be increased by any factor that either increases the enthalpy of fusion or decreases the entropy of fusion [Abramowitz1990]. Entropy of fusion can be further defined as [Abramowitz1990]:
ΔS_fus=ΔS_pos+ΔS_exp+ΔS_rot (Eq.2)
ΔS_pos is the positional entropy of fusion, which is related to the change in randomness from solid phase to the liquid phase [Abramowitz1990]. ΔS_exp is the entropy of expansion, which is calculated from the equation [Abramowitz1990]:
ΔS_exp=Rln(V(f)_liq/V(f)_solid) (Eq.3)
In this equation, V(f) is the free volume [Abramowitz1990]. ΔS_rot is the entropy of rotation, which reflects the change from the ordered orientation of molecules in the solid to their random orientation in the liquid [Abramowitz1990]. Two factors that affect the entropy of fusion and the strength of the crystal lattice are molecular symmetry and intermolecular forces [Katritzky2001].
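As a rough illustration of how Eq.1 through Eq.3 fit together, the sketch below computes a melting point from an enthalpy of fusion and the three entropy terms. All numerical inputs are hypothetical placeholders chosen only to show the trend, not measured values, and the function names are illustrative.

```python
import math

R = 8.314  # gas constant in J/(mol*K)

def entropy_of_expansion(v_free_liquid, v_free_solid):
    """Eq.3: entropy of expansion from the ratio of free volumes."""
    return R * math.log(v_free_liquid / v_free_solid)

def entropy_of_fusion(ds_pos, ds_exp, ds_rot):
    """Eq.2: total entropy of fusion as the sum of its three parts."""
    return ds_pos + ds_exp + ds_rot

def melting_point(dh_fus, ds_fus):
    """Eq.1: melting point in K from dH_fus (J/mol) and dS_fus (J/(mol*K))."""
    return dh_fus / ds_fus

# Hypothetical inputs: the free volume doubles on melting, and the two
# compounds differ only in their entropy of rotation.
ds_exp = entropy_of_expansion(2.0, 1.0)
ds_fus_a = entropy_of_fusion(10.0, ds_exp, 30.0)  # larger dS_rot
ds_fus_b = entropy_of_fusion(10.0, ds_exp, 20.0)  # smaller dS_rot
mp_a = melting_point(15000.0, ds_fus_a)
mp_b = melting_point(15000.0, ds_fus_b)
print(mp_a < mp_b)  # the compound with the smaller dS_fus melts higher
```

With the enthalpy of fusion held fixed, lowering any one of the entropy terms raises the computed melting point, which is the relationship stated for Eq.1.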
The symmetry of a molecule affects the entropy of rotation, which also affects the entropy of fusion and melting point [Abramowitz1990]. The molecular rotational symmetry number, sigma, is defined as the number of identical positions that can be obtained from rotating the molecule about its center [Dannenfelser1993]. When the value of sigma is higher, there is a higher probability of the molecule being in the correct orientation to fit into the crystal lattice [Dannenfelser1993]. A more symmetrical molecule has a lower rotational entropy of fusion and therefore, it melts at a higher temperature [Abramowitz1990]. Carnelley’s rule states that compounds of high molecular symmetry have high melting points [Brown2000]. This rule is valid for ordered crystals of rigid molecules, disordered crystals of rigid molecules, crystals with intermolecular hydrogen bonds, crystals of nonrigid molecules, and even molten salts [Brown2000]. Symmetry is applicable to melting point, but it does not affect the boiling point of a molecule [Abramowitz1986]. Therefore, there is a relationship between symmetry and melting point.
Intermolecular forces, or forces of attraction between molecules, also affect the melting point of the molecule by affecting the strength of the crystal lattice [Katritzky2001]. Molecules with stronger intermolecular forces have higher melting points because of the higher strength of the crystal lattice [Katritzky2001]. The dominant intermolecular force that affects the strength of the crystal lattice for organic compounds is hydrogen bonding [Katritzky2001]. The contribution of hydrogen bonding to intermolecular forces depends on the strength and number of hydrogen bonds [Abramowitz1986]. There are several factors that influence hydrogen bonding, including the maximum possible number of hydrogen bonds, the number of possible donor atoms, the number of possible acceptor atoms, the distance between the acceptor and donor, the angle between the acceptor and donor, and the number of possible intramolecular hydrogen bonds that compete with intermolecular hydrogen bonding [Martin1979, Abramowitz1986]. Typically, molecules with strong intramolecular forces, which are the forces of attraction within a molecule, have weaker intermolecular forces and therefore have lower melting points [Katritzky2001].
In the absence of hydrogen bonds and other strong intermolecular forces, van der Waals intermolecular forces affect the strength of the crystal lattice [Slovokhotov2007]. These forces are weak and therefore, the crystal lattice is weak and a low melting point is expected [Slovokhotov2007]. Van der Waals intermolecular forces include dipolar, induction, and dispersion forces [Abramowitz1986]. There are several descriptors that contribute to these forces, including ionization potential, polarizability, separation distance, and dipole moment [Abramowitz1986]. For determining the melting point of an organic compound, the difference in intermolecular forces between the solid and liquid is important, but the total intermolecular force is not [Abramowitz1986]. Ionization potential, polarizability, and separation distance can be used to find the difference in dispersion energies between the solid and liquid [Abramowitz1986]. Polarizability, dipole moment, and separation distance can be used to find the difference between induction forces of the solid and liquid [Abramowitz1986]. The dipole moment and separation distance can be used to find the difference in dipolar forces between the solid and liquid [Abramowitz1986]. Molecular rotational symmetry and intermolecular forces affect the melting point and therefore, these factors should be considered for melting point prediction.
Descriptors
The effect of molecular symmetry and intermolecular forces on melting point makes it difficult to derive methods that can predict the melting point of a wide variety of organic compounds with high accuracy. Several descriptors have been created to put values on these effects in an attempt to predict melting point values. Molecular symmetry has been quantified by descriptors, including EXPAN, ORTHO, sigma, and SIGMAL. EXPAN is a descriptor that is defined as the eccentricity cubed, where eccentricity is the maximum molecular length divided by the mean molecular diameter [Abramowitz1990]. Taking the mean molecular diameter as the diameter of a sphere with the same volume as the molecule, this relationship can be represented by the equation [Abramowitz1990]:
EXPAN = (pi*length^3) / (6*volume) (Eq.4)
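A minimal sketch of the EXPAN calculation follows, working directly from the verbal definition (eccentricity cubed). The helper that derives a mean diameter from a molecular volume assumes a sphere of equal volume; that assumption and the function names are illustrative and are not taken from the cited paper.

```python
import math

def mean_diameter_from_volume(volume):
    """Diameter of a sphere with the given volume (an assumption of this sketch)."""
    return (6.0 * volume / math.pi) ** (1.0 / 3.0)

def expan(length, mean_diameter):
    """EXPAN: eccentricity cubed, with eccentricity = length / mean diameter."""
    return (length / mean_diameter) ** 3

# A molecule twice as long as its mean diameter has EXPAN = 2^3.
print(expan(2.0, 1.0))  # 8.0
```

Length and diameter must be in the same units, and the volume in the corresponding cubic units, so that EXPAN is dimensionless. A perfectly spherical molecule gives EXPAN = 1, and elongated molecules give larger values.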
ORTHO is a descriptor that is defined as the number of functional groups that are ortho, or adjacent, to another group [Abramowitz1990]. For this descriptor, fluorine is equivalent to hydrogen and therefore, fluorine does not contribute to ortho interactions [Abramowitz1990]. Sigma is the rotational symmetry number that is calculated as the number of orientations of the molecule that are identical to the original orientation from a certain position [Abramowitz1990].
There are several discrepancies in the definition of sigma. Abramowitz and Yalkowsky define sigma as the molecular rotational symmetry number taking the size of substituted groups into account [Abramowitz1990]. Based on this definition, fluoro is equivalent to hydrogen and bromo, chloro, cyano, and nitro are equivalent to methyl [Abramowitz1990]. For example, ortho-, meta-, and para-bromochlorobenzene have sigma values of 2, 2, and 4, respectively [Abramowitz1990]. This version of the descriptors will be referred to as SIGMALabramowitz and sigmaAbramowitz. Abramowitz justifies his system of equal substituents because it is possible for a molecule without a symmetry center to form centrosymmetric crystals based on the positional entropy of the crystals [Abramowitz1986]. These disordered rigid structures can be confirmed through x-ray diffraction analysis [Abramowitz1986]. Abramowitz also uses a variation of sigma called SIGMAL, which is a descriptor that is defined as the logarithm of sigma [Abramowitz1990].
Many other researchers use a definition of sigma that is different from the definition Abramowitz describes. Martin, Yalkowsky, and Wells define sigma as the symmetry number not taking the size of substituted groups into account and therefore, each substituent group is considered different [Martin1979]. For example, ortho-, meta-, and para-bromochlorobenzene have sigma values of 1, 1, and 2, respectively [Martin1979]. This version of the descriptors will be referred to as SIGMALmartin and sigmaMartin. Dannenfelser, Surendran, and Yalkowsky agree with the definition of sigma that is described by Martin, Yalkowsky, and Wells [Mishra1991].
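The two conventions can be compared side by side for the bromochlorobenzene isomers quoted above. In this sketch the sigma values are the ones given in the text; the base of the logarithm in SIGMAL is not stated there, so base 10 is assumed, and the dictionary and function names are illustrative.

```python
import math

# Sigma values for ortho-, meta-, and para-bromochlorobenzene as quoted
# in the text for each convention.
SIGMA = {
    "abramowitz": {"ortho": 2, "meta": 2, "para": 4},
    "martin": {"ortho": 1, "meta": 1, "para": 2},
}

def sigmal(sigma):
    """SIGMAL as the logarithm of sigma (base 10 is an assumption here)."""
    return math.log10(sigma)

for convention, values in SIGMA.items():
    for isomer, sigma in values.items():
        print(convention, isomer, sigma, round(sigmal(sigma), 3))
```

Under either convention the para isomer receives the largest SIGMAL, but the absolute values differ, so regression coefficients fitted with one convention cannot be reused with the other.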
Other descriptors have been used to predict melting points because of their contribution to intermolecular forces [Abramowitz1986]. Descriptors including HNO and IHB are significant because of their influence on hydrogen bonding [Martin1979]. HNO is a descriptor that is defined as the maximum possible number of hydrogen bonds and is also known to affect the entropy of fusion [Martin1979]. This descriptor can be predicted from molecular structure by calculating two times the number of hydrogen bond acceptor sites and two times the number of hydrogen bond donor sites and using the smaller of the two values [Martin1979]. IHB is a descriptor that is defined as the number of possible internal, or intramolecular, hydrogen bonds that compete with external, or intermolecular, hydrogen bonding [Martin1979].
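The HNO rule described above reduces to a one-line calculation. The sketch below assumes the acceptor and donor site counts have already been determined from the structure; the function name and the example counts are hypothetical.

```python
def hno(n_acceptor_sites, n_donor_sites):
    """Maximum possible number of hydrogen bonds: the smaller of
    twice the acceptor sites and twice the donor sites."""
    return min(2 * n_acceptor_sites, 2 * n_donor_sites)

# Hypothetical molecule with two acceptor sites and one donor site.
print(hno(2, 1))  # 2
```

A molecule with no donor sites gets HNO = 0 regardless of how many acceptor sites it has, since the limiting partner determines the count.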
Several other descriptors contribute to intermolecular forces because of their influence on van der Waals interactions [Abramowitz1986]. These descriptors include dipole moment, quadrupole moment, polarizability, ionization potential, and intermolecular separation distance [Abramowitz1986, Martin1979]. Dipole moment is a descriptor that is known to affect the entropy of fusion and can be predicted from molecular structure [Martin1979]. This descriptor is typically abbreviated as either DM or u [Martin1979, Abramowitz1986]. The net dipole moment is a measure of a molecule’s ability to interact electrostatically with neighboring dipoles [Abramowitz1986]. It is also a measure of the ability of a molecule to induce a dipole in a neighboring molecule [Abramowitz1986]. Dipole moment contributes significantly to dipolar forces and induction forces [Abramowitz1986]. It is important to understand that at close distances of separation, a scalar sum of the individual group dipole moments is more reasonable for correlating melting point models [Abramowitz1986]. A variation on dipole moment is quadrupole moment, which is represented by QM [Abramowitz1986]. Quadrupole moment is a descriptor that accounts for electronic effects and is defined as the sum of the charges multiplied by the squared distance between the charges [Martin1979]. Based on this definition, QM can be calculated as the sum of the squares of the dipole moments of each substituent group [Martin1979]. Polarizability is a descriptor that is often represented by alpha or a, and is defined as the ability of the molecule to interact with dipoles [Martin1979]. The polarizability of a molecule is a measure of the size of the molecule’s electron cloud [Abramowitz1986]. Often, it is proportional to the volume of the molecule [Abramowitz1986]. Alpha is the sum of the polarizabilities of each group in the molecule [Martin1979].
Group values for C6H4, methyl, chloro, bromo, nitro, amine, hydroxy, and carboxyl are 10, 2.2, 2.3, 3.3, 2.6, 1.7, 1.0, and 2.8, respectively [Martin1979]. Polarizability significantly contributes to induction and dispersion forces [Abramowitz1986]. Ionization potential, which is represented by I, is a descriptor defined as the energy required to remove an electron from a molecule [Abramowitz1986]. This descriptor minimally affects dispersion forces and it does not affect dipolar or induction forces [Abramowitz1986]. Intermolecular separation distance, which is abbreviated as r, is a non-molecular parameter that depends on packing efficiency [Abramowitz1986]. For benzenes, the intermolecular separation distance is the distance between interacting substituent groups [Abramowitz1986]. This parameter has major effects on van der Waals forces [Abramowitz1986]. Generally, intermolecular separation distances are smaller in a solid than a liquid [Abramowitz1986].
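The additive descriptors alpha and QM can be sketched as simple sums. The group polarizabilities below are the values quoted in the text; the function names, the restriction to disubstituted benzenes (a C6H4 core plus two substituents), and the example dipole moments are assumptions made for illustration.

```python
# Group polarizability values as quoted in the text [Martin1979].
GROUP_POLARIZABILITY = {
    "C6H4": 10.0, "methyl": 2.2, "chloro": 2.3, "bromo": 3.3,
    "nitro": 2.6, "amine": 1.7, "hydroxy": 1.0, "carboxyl": 2.8,
}

def alpha(substituents):
    """Sum of group polarizabilities for a disubstituted benzene:
    the C6H4 core plus each substituent group."""
    core = GROUP_POLARIZABILITY["C6H4"]
    return core + sum(GROUP_POLARIZABILITY[g] for g in substituents)

def qm(group_dipole_moments):
    """QM as the sum of the squares of the substituent dipole moments."""
    return sum(mu ** 2 for mu in group_dipole_moments)

print(round(alpha(["chloro", "bromo"]), 1))  # 15.6 for a bromochlorobenzene
print(qm([1.0, 2.0]))  # 5.0 for two hypothetical group dipole moments
```

Because both descriptors are sums over groups, they take the same value for the ortho, meta, and para isomers of a given compound; only descriptors such as sigma distinguish the isomers.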
Other physical properties can be used as descriptors for predicting melting point. Boiling point is a bulk property of molecular substances that is defined as the temperature at which the molecule can escape from the liquid and enter the vapor phase [Dearden2003]. Boiling point is a physical property that has been analyzed to determine its relationship with melting point [Abramowitz1990]. This property can be used in the prediction of melting point values [Abramowitz1990]. Studies show that boiling point can be estimated using additive constitutive properties, which are properties that can be added based on individual groups or atoms and that are dependent on structure [Simamora1993, Abramowitz1990]. Melting point prediction requires both additive constitutive properties and non-additive non-constitutive properties, such as rotational symmetry [Simamora1993]. Therefore, boiling point can be used for the prediction of melting point as an additive constitutive property [Simamora1993, Abramowitz1990].
Development and Analysis of QSPR Models
QSPR, or quantitative structure-property relationships, are structure-based models that have been developed to overcome the problems and difficulties with group contribution methods [Godavarthy2006]. QSPR models use linear models and descriptors to relate physical and chemical properties to molecular structures [Godavarthy2006, Katritzky2010]. There are two major aspects of QSPR modeling: representation of chemical parameters and statistical analysis [Katritzky2010]. To develop a QSPR model, a training set must be established, the molecular structure must be defined, descriptors must be generated and analyzed for importance, and then a linear model must be created and analyzed [Katritzky2010]. A training set is a random selection of a certain percentage of the original data set that is used to generate descriptors [Breiman2001, Katritzky2010].
There are several methods and software programs for developing and analyzing descriptors. One software program that is used extensively is CODESSA [Katritzky2002A, Godavarthy2006, Katritzky1997, Katritzky2002B]. CODESSA is a program that can provide correlations and understanding about important descriptors that affect physical properties, such as melting point [Katritzky2002]. The program uses the best multilinear regression method, or BMLR, to find the best multilinear correlations [Katritzky2002A]. It also eliminates descriptors that are not available for some molecules [Katritzky2002A]. Using this method, the descriptors are ordered by decreasing correlation coefficient, and then the top descriptors are correlated with the remaining descriptors to determine the best pairs of descriptors [Katritzky2002A]. Other software programs that are used for QSPR modeling are AMPAC, DRAGON, POLLY, ADAPT, OASIS, Chem-X, Tsar, and QSAR-Model [Godavarthy2006, Katritzky2010, Trohalaki2005, Dearden2003].
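The descriptor-ordering step described for BMLR can be illustrated with a small, dependency-free sketch. This is not CODESSA's implementation: it only ranks descriptors by the magnitude of their Pearson correlation with the target, and the descriptor names and data are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def rank_descriptors(descriptors, target):
    """Order descriptors by decreasing |r| against the target property."""
    scored = [(name, pearson_r(values, target))
              for name, values in descriptors.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)

# Made-up melting points (K) and descriptor values for five compounds.
melting_points = [300.0, 320.0, 340.0, 360.0, 380.0]
descriptors = {
    "descriptor_c": [2.0, 2.0, 3.0, 2.0, 3.0],
    "descriptor_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "descriptor_b": [5.0, 3.0, 4.0, 1.0, 2.0],
}
ranking = rank_descriptors(descriptors, melting_points)
print([name for name, _ in ranking])
```

Ranking by absolute value keeps strongly anti-correlated descriptors near the top, since a negative coefficient in a linear model is as informative as a positive one. A full BMLR run would go on to test pairs and larger sets of descriptors, which this sketch omits.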
Another useful method for developing and analyzing descriptors is a Random Forest, which is a method of classification and regression [Palmer2007]. This method is very useful for QSPR modeling because it has high accuracy for prediction, it has a method of descriptor selection, and it has a method for determining the significance of each descriptor for the data set [Palmer2007]. The Random Forest method is based on growing an ensemble of decision trees and having them evaluate the significance of descriptors [Breiman2001]. The ensemble of trees is created by random vectors that control the growth of each tree [Breiman2001]. Below is an example of a random forest:
Figure 1: An example of a Random Forest plot [Breiman2001]
The graph shows how the Random Forest sorts variables, or descriptors, in order of importance [Breiman2001]. The most important descriptors for the data set are in the upper right corner of the graph [Breiman2001]. This method is useful for determining the most important descriptors and then, the descriptors can be applied to QSPR modeling [Palmer2007].
After the descriptors are created and selected, a QSPR model is developed and then statistically analyzed [Katritzky2010]. Statistical analysis must be applied to a QSPR melting point model to evaluate its performance. To determine the accuracy of the model, a plot of predicted melting points versus experimental melting points can be used [Marcus1998]. The best-fit line on this plot has a correlation coefficient, or r-value, which is typically reported as its square, R^2 [Katritzky2010]. This regression line determines the coefficients of the descriptors used for the model [Katritzky2010]. An example of this type of plot is depicted below [Katritzky2010]:
Figure 2: Predicted Melting Point vs. Experimental Melting Point Plot [Katritzky2010]
The accuracy of a QSPR model is quantified by root mean square error (RMSE/RMSEP), residual standard deviation (RSD), or predictive squared correlation coefficient (Q^2) [Katritzky2010]. Average absolute error, or AAE, is also a statistic that can be used [Katritzky2010]. The predictive ability of a QSPR model is typically quantified by RMSEP [Faber1999]. There are two methods for estimating the uncertainty in RMSEP: the approximate sampling distribution of the mean square and error propagation [Faber1999]. These statistical analyses are an essential aspect of QSPR modeling because they are part of the development and validation of the model [Katritzky2010].
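The statistics named above are straightforward to compute from paired predicted and experimental values. The sketch below uses the usual textbook definitions of RMSE, average absolute error, and the squared Pearson correlation; the data are made up, and the function names are illustrative.

```python
import math

def rmse(predicted, experimental):
    """Root mean square error between predictions and experiment."""
    n = len(predicted)
    return math.sqrt(sum((p - e) ** 2
                         for p, e in zip(predicted, experimental)) / n)

def aae(predicted, experimental):
    """Average absolute error between predictions and experiment."""
    n = len(predicted)
    return sum(abs(p - e) for p, e in zip(predicted, experimental)) / n

def r_squared(predicted, experimental):
    """Squared Pearson correlation of predicted versus experimental values."""
    n = len(predicted)
    mp = sum(predicted) / n
    me = sum(experimental) / n
    spe = sum((p - mp) * (e - me) for p, e in zip(predicted, experimental))
    spp = sum((p - mp) ** 2 for p in predicted)
    see = sum((e - me) ** 2 for e in experimental)
    return spe ** 2 / (spp * see)

# Made-up melting points in K for three compounds.
predicted = [300.0, 310.0, 330.0]
experimental = [303.0, 306.0, 330.0]
print(rmse(predicted, experimental), aae(predicted, experimental),
      r_squared(predicted, experimental))
```

RMSE weights large errors more heavily than AAE, so reporting both gives a sense of whether a model's error is dominated by a few outliers. Computing the same statistics on compounds held out of the training set gives RMSEP rather than RMSE.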
QSPR Models
QSPR models are useful for melting point prediction and analysis and have been used by various researchers to develop melting point models for benzene derivatives [Abramowitz1990, Martin1979, Katritzky1997, Dearden1988]. Abramowitz and Yalkowsky used QSPR as a method for testing the influence of additive constitutive properties on predicting melting point by comparing the relationship between melting point (in Kelvin) and boiling point (in Kelvin) using the equation [Abramowitz1990]:
MP = 0.848*BP – 105.7 (Eq.5)
When this equation was applied to 85 benzene derivatives, there was a correlation, or r-value, of 0.763 [Abramowitz1990]. This low correlation indicates that melting point is influenced by more than just additive constitutive properties [Abramowitz1990].
After showing that melting point is influenced by non-additive non-constitutive properties, Abramowitz and Yalkowsky applied different descriptors to their data set using QSPR modeling [Simamora1993, Abramowitz1990]. Rotational symmetry is a non-additive non-constitutive property and therefore, Abramowitz and Yalkowsky used SIGMALabramowitz along with the additive constitutive property, boiling point, to predict melting point with the equation [Simamora1993, Abramowitz1990]:
MP = 0.952*BP + 113.0*SIGMALabramowitz – 206.2 (Eq.6)
When this equation was applied to their data set, there was a correlation of 0.908 [Abramowitz1990]. This r-value supports the importance of SIGMALabramowitz for estimating melting point.
Abramowitz and Yalkowsky also tested a four-parameter equation to estimate melting point values of their data set [Abramowitz1990]. These parameters include boiling point, SIGMALabramowitz, EXPAN, and ORTHO [Abramowitz1990]. The equation they developed is [Abramowitz1990]:
MP = 0.772*BP + 110.8*SIGMALabramowitz +11.56*ORTHO + 31.9*EXPAN – 234.4 (Eq.7)
This equation had a correlation of 0.938 on their data set [Abramowitz1990]. This correlation is slightly better than the previous correlation with the two-parameter equation [Abramowitz1990].
Abramowitz and Yalkowsky tested an equation that only included non-additive non-constitutive properties, SIGMALabramowitz, ORTHO, and EXPAN, and not the additive constitutive property, boiling point, to predict melting point [Abramowitz1990]. By testing an equation with only non-additive non-constitutive properties, Abramowitz and Yalkowsky were testing the importance of additive constitutive properties for estimating melting point [Abramowitz1990]. The equation with these descriptors is [Abramowitz1990]:
MP = 86.5*SIGMALabramowitz + 29.8*ORTHO + 63.0*EXPAN + 20.5 (Eq.8)
The equation had a correlation of 0.767 on their data set [Abramowitz1990]. The low correlation indicates that both non-additive non-constitutive properties and additive constitutive properties are significant for melting point estimation of benzene derivatives [Abramowitz1990].
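Eq.5 through Eq.8 can be written as plain functions to make the role of each descriptor explicit. The coefficients are those quoted above, with temperatures in Kelvin; the function names and the example inputs are hypothetical, and SIGMAL is taken as a base-10 logarithm, which is an assumption of this sketch.

```python
import math

def mp_eq5(bp):
    """Eq.5: melting point from boiling point alone (both in K)."""
    return 0.848 * bp - 105.7

def mp_eq6(bp, sigmal):
    """Eq.6: boiling point plus SIGMALabramowitz."""
    return 0.952 * bp + 113.0 * sigmal - 206.2

def mp_eq7(bp, sigmal, ortho, expan):
    """Eq.7: four-parameter model with ORTHO and EXPAN added."""
    return 0.772 * bp + 110.8 * sigmal + 11.56 * ortho + 31.9 * expan - 234.4

def mp_eq8(sigmal, ortho, expan):
    """Eq.8: non-additive non-constitutive descriptors only."""
    return 86.5 * sigmal + 29.8 * ortho + 63.0 * expan + 20.5

# Hypothetical compound: BP = 450 K, sigma = 4, no ortho groups, EXPAN = 1.5.
bp, sigmal, ortho, expan = 450.0, math.log10(4), 0, 1.5
for label, value in [("Eq.5", mp_eq5(bp)),
                     ("Eq.6", mp_eq6(bp, sigmal)),
                     ("Eq.7", mp_eq7(bp, sigmal, ortho, expan)),
                     ("Eq.8", mp_eq8(sigmal, ortho, expan))]:
    print(label, round(value, 1))
```

In Eq.6 through Eq.8 the SIGMAL coefficient is large relative to the others, so moving from sigma = 1 to sigma = 4 shifts the estimate by tens of Kelvin under the base-10 assumption.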
The melting point models of Abramowitz and Yalkowsky are examples of QSPR. Several other methods using QSPR have been developed and tested for melting point prediction. One such melting point model was developed by Martin, Yalkowsky, and Wells [Martin1979]. These authors developed two models that could predict melting point in Celsius using the descriptors sigma(martin), alpha, QM, HNO, IHB, and DM [Martin1979]. One equation they developed is [Martin1979]:
MP = 29.7*sigma(martin) + 28.7*alpha + 5.4*QM + 41.8*HNO – 28.5*IHB – 10.2*DM – 492 (Eq.9)
This equation has a correlation, or r-value, of 0.892 for their data set consisting of 84 disubstituted benzene derivatives [Martin1979]. Another equation they developed only uses the descriptors, sigma(martin), alpha, QM, and HNO [Martin1979]:
MP = 36.9*sigma(martin) + 30.2*alpha + 4.07*QM + 40.4*HNO – 535 (Eq.10)
This equation has a correlation of 0.882 for the same data set [Martin1979]. Therefore, the six-parameter equation has only a slightly better correlation than the four-parameter equation [Martin1979].
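For comparison, Eq.9 and Eq.10 can be sketched the same way. The coefficients are those quoted above, with the result in degrees Celsius; the function names and the descriptor values in the example are hypothetical.

```python
def mp_eq9(sigma, alpha, qm, hno, ihb, dm):
    """Eq.9: six-parameter model of Martin, Yalkowsky, and Wells (deg C)."""
    return (29.7 * sigma + 28.7 * alpha + 5.4 * qm
            + 41.8 * hno - 28.5 * ihb - 10.2 * dm - 492.0)

def mp_eq10(sigma, alpha, qm, hno):
    """Eq.10: four-parameter model without IHB and DM (deg C)."""
    return 36.9 * sigma + 30.2 * alpha + 4.07 * qm + 40.4 * hno - 535.0

# Hypothetical descriptor values for a disubstituted benzene.
sigma, alpha, qm, hno, ihb, dm = 2, 15.6, 4.0, 0, 0, 1.5
print(round(mp_eq9(sigma, alpha, qm, hno, ihb, dm), 1))
print(round(mp_eq10(sigma, alpha, qm, hno), 1))
```

IHB and DM enter Eq.9 with negative coefficients, so internal hydrogen bonding and a larger dipole moment both lower the estimate in this model, while each additional possible hydrogen bond (HNO) raises it by about 40 degrees.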
Katritzky, Maran, Karelson, and Lobanov developed several melting point models for benzene derivatives using QSPR [Katritzky1997]. The first model is a nine-parameter equation for a data set of 443 substituted benzenes, including 131 ortho-substituted benzenes, 124 meta-substituted benzenes, and 155 para-substituted benzenes [Katritzky1997]. This model uses the descriptors HDSA2, av valency of H atom, TMSA, first order av structural information content, second order av information content, maximum total interaction for a C-H bond, av nucleophile reaction index for a C atom, Beta polarizability, and sigma(martin) [Katritzky1997]. This model had a correlation, or r-value, of 0.915 [Katritzky1997]. Three additional six-parameter models were created for the ortho-substituted, meta-substituted, and para-substituted subsets, respectively [Katritzky1997]. The ortho-substituted benzene model uses the descriptors HDSA2, ALFA polarizability, av valency of H atom, maximum total interaction for a C-H bond, ZX shadow, and maximum partial charge for an O atom, and it has an r-value of 0.922 for the data set [Katritzky1997]. The meta-substituted benzene model uses the descriptors av valency of H atom, av nucleophile reaction index for C atom, HDSA2, sigma(martin), LUMO energy, and DPSA2 difference in CPSAs [Katritzky1997]. The correlation for this model was 0.923 for the data set [Katritzky1997]. The para-substituted benzene model uses the descriptors XY shadow, new H-acceptors PSA, minimum e-e repulsion for H atom, HDSA2, second order av information content, and av valency of H atom, and the correlation for this model was 0.929 for the data set [Katritzky1997]. The researchers used software to determine which descriptors have the highest significance and are most useful for obtaining a high correlation value for the data sets [Katritzky1997]. It would be difficult to replicate these models because software is required to obtain values for many of these descriptors [Katritzky1997].
Another QSPR model was developed by Dearden and Rahman [Dearden1988]. This model is a seven-parameter equation that was created based on a data set of 43 anilines [Dearden1988]. The equation uses the descriptors hydrogen bond donor ability (HD), Swain-Lupton field parameter (F), third order valence-corrected molecular connectivity, hydrogen bond acceptor ability (HA), p-substitution (IP), Swain-Lupton resonance parameter (R), and Sterimol length parameter (L) [Dearden1988]. This model resulted in a correlation of 0.94 for the data set, and when two outliers were removed, the new model had a correlation of 0.961 [Dearden1988].
Based on the various QSPR models, there are several obvious problems with this method of modeling. Each model has a high correlation for the data set it was created from, but none of these models were replicated or extended to larger data sets. Also, many of the models are very specific to a certain group of organic molecules. There are models for only benzene derivatives, only ortho-substituted benzene derivatives, only meta-substituted benzene derivatives, only para-substituted benzene derivatives, and only anilines [Abramowitz1990, Dearden1988, Martin1979, Katritzky1997]. Even when looking beyond benzene derivatives, the models are very specific to their data sets. There is a model that applies only to hydrocarbons [Tsakalotos1906] and a model that applies only to alkanes [Burch2004]. These problems have affected the overall progress in research of melting point models. Another problem with QSPR modeling is that it requires additional software or methods for molecular screening [Godavarthy2006]. Although QSPR modeling is useful, it has some flaws.
Conclusion
QSPR modeling is a useful method for creating and analyzing melting point models of benzene derivatives. In the last twenty to thirty years, there have been advancements in melting point modeling, but researchers have all used different descriptors and methods to develop results. Comparative analyses of different melting point models of benzene derivatives could be very useful for further advancements in melting point modeling. Based on the significance of molecular symmetry and intermolecular forces on melting point, further research and analysis should also be done on these relationships.
References
Abramowitz, R. Estimation of the Melting Point of Rigid Organic Compounds. ProQuest Dissertations and Theses. 225, (1986). [DOI]
Abramowitz, R., Yalkowsky, S.H. Melting Point, Boiling Point, and Symmetry. Pharmaceutical Research. 7(9), 942-947 (1990). [DOI]
Breiman, L. Random Forests. Machine Learning. 45(1), 5-32 (2001). [DOI]
Brown, R.J.C., Brown, R.F.C. Melting Point and Molecular Symmetry. Journal of Chemical Education. 77(6), 724 (2000). [DOI]
Burch, K.J., Whitehead, E.G. Melting-Point Models of Alkanes. Journal of Chemical & Engineering Data. 49(4), 858-863 (2004). [DOI]
Dannenfelser, R.M., Surendran, N., Yalkowsky, S.H. Molecular Symmetry and Related Properties. SAR and QSAR in Environmental Research. 1(4), 273-292 (1993). [DOI]
Dearden, J.C. Quantitative Structure-Property Relationships for Prediction of Boiling Point, Vapor Pressure, and Melting Point. Environmental Toxicology and Chemistry. 22(8), 1696-1709 (2003). [DOI]
Dearden, J.C., Rahman, M.H. QSAR Approach to the Prediction of Melting Points of Substituted Anilines. Mathematical and Computer Modelling. 11(1), 843-846 (1988). [DOI]
Faber, N.M. Estimating the uncertainty in estimates of root mean square error of prediction: application to determining the size of an adequate test set in multivariate calibration. Chemometrics and Intelligent Laboratory Systems. 49(1), 79-89 (1999). [DOI]
Godavarthy, S.S., Robinson, R.L., Gasem, K.A.M. An Improved Structure-Property Model for Predicting Melting-Point Temperatures. Industrial & Engineering Chemistry Research. 45(14), 5117-5126 (2006). [DOI]
Katritzky, A.R., Jain, R., Lomaka, A., Petrukhin, R., Karelson, M., Visser, A.E., Rogers, R.D. Correlation of the Melting Points of Potential Ionic Liquids (Imidazolium Bromides and Benzimidazolium Bromides) Using the CODESSA Program. Journal of Chemical Information and Computer Sciences. 42(2), 225-231 (2002). [DOI]
Katritzky, A.R., Jain, R., Lomaka, A., Petrukhin, R., Maran, U., Karelson, M. Perspective on the Relationship between Melting Points and Chemical Structure. Crystal Growth & Design. 1(4), 261-265 (2001). [DOI]
Katritzky, A.R., Kuanar, M., Slavov, S., Hall, C.D., Karelson, M., Kahn, I., Dobchev, D. Quantitative Correlation of Physical and Chemical Properties with Chemical Structure: Utility for Prediction. Chemical Reviews. 110(10), 5714-5789 (2010). [DOI]
Katritzky, A.R., Lomaka, A., Petrukhin, R., Jain, R., Karelson, M., Visser, A.E., Rogers, R.D. QSPR Correlation of the Melting Point for Pyridinium Bromides, Potential Ionic Liquids. Journal of Chemical Information and Computer Sciences. 42(1), 71-74 (2002). [DOI]
Katritzky, A.R., Maran, U., Karelson, M., Lobanov, V.S. Prediction of Melting Points for the Substituted Benzenes: A QSPR Approach. Journal of Chemical Information and Computer Sciences. 37(5), 913-919 (1997). [DOI]
Marcus, A.H., Elias, R.W. Some Useful Statistical Methods for Model Validation. Environmental Health Perspectives. 106(6), 1541-1550 (1998). [DOI]
Martin, E., Yalkowsky, S.H., Wells, J.E. Fusion of Disubstituted Benzenes. Journal of Pharmaceutical Sciences. 68(5), 565-568 (1979). [DOI]
Mishra, D.S., Yalkowsky, S.H. Estimation of Vapor Pressure of Some Organic Compounds. Industrial & Engineering Chemistry Research. 30(7), 1609-1612 (1991). [DOI]
Palmer, D.S., O’Boyle, N.M., Glen, R.C., Mitchell, J.B.O. Random Forest Models to Predict Aqueous Solubility. Journal of Chemical Information and Modeling. 47(1), 150-158 (2007). [DOI]
Simamora, P., Miller, A.H., Yalkowsky, S.H. Melting Point and Normal Boiling Point Correlations: Applications to Rigid Aromatic Compounds. Journal of Chemical Information and Computer Sciences. 33(3), 437-440 (1993). [DOI]
Slovokhotov, Y.L., Batsanov, A.S., Howard, J.A.K. Molecular van der Waals symmetry affecting bulk properties of condensed phases: melting and boiling points. Structural Chemistry. 18(4), 477-491 (2007). [DOI]
Trohalaki, S., Pachter, R., Drake, G.W., Hawkins, T. Quantitative Structure-Property Relationships for Melting Points and Densities of Ionic Liquids. Energy & Fuels. 19(1), 279-284 (2005). [DOI]
Tsakalotos, D.E. Sur le point de fusion des hydrocarbures homologues du methane. Comptes rendus de l’Academie des Sciences. Paris. 143, 1235-1236 (1906). [DOI]